Towards conversational speech synthesis; lessons learned from the expressive speech processing project
Abstract
This paper discusses some ideas for the requirements and methods of conversational speech synthesis, based on experience gained from the collection and analysis of a very large corpus of conversational speech in a variety of real-life everyday contexts. It shows that because variation in voice quality plays a significant part in the transmission of interpersonal and affect-related social information, this feature should be given priority in future speech synthesis research. Several solutions to this problem are proposed.
Similar resources
A 3D audio-visual animated agent for expressive conversational question answering
This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1/ perceptual experiments (e.g. perception of expressivity and 3D movements in both audio and visual channels); 2/ design of human-computer interfaces requiring head mo...
Which resemblance is useful to predict phrase boundary rise labels for Japanese expressive text-to-speech synthesis, numerically-expressed stylistic or distribution-based semantic?
To establish expressive text-to-speech synthesis, current research studies both the processing of input text and the rendering of natural expressive speech. Focusing on the former as a front-end task in the production of synthetic speech, this paper investigates a novel feature for predicting phrase boundary tone labels which transcribe local fundamental frequency (F0) changes frequently appear...
Application of expressive TTS synthesis in an advanced ECA system
The research project COMPANIONS aims at developing an advanced embodied conversational agent (ECA). This ECA is used in two scenarios and two languages (English and Czech), and it requires a TTS system able to generate very natural expressive and emotional speech output. This paper describes application issues of two such systems within the ECA, introduces approaches to expressive speech ...
Evaluating expressive speech synthesis from audiobooks in conversational phrases
Audiobooks are a rich resource of large quantities of natural sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice sty...
Incremental Coordination: Attention-Centric Speech Production in a Physically Situated Conversational Agent
Inspired by studies of human-human conversations, we present methods for incrementally coordinating speech production with listeners’ visual foci of attention. We introduce a model that considers the demands and availability of listeners’ attention at the onset and throughout the production of system utterances, and that incrementally coordinates speech synthesis with the listener’s gaze. We pr...